Run-Time Reference Clustering for Cache Performance Optimization

Authors

  • Wesley K. Kaplow
  • Boleslaw K. Szymanski
  • Peter Tannenbaum
  • Viktor K. Decyk

Abstract

We introduce a method for improving the cache performance of irregular computations in which data are referenced through indirection arrays defined at run time. Such computations arise frequently in scientific problems. The method, called Run-Time Reference Clustering (RTRC), is a run-time analog of the compile-time blocking (tiling) used for dense matrix problems. RTRC reuses the data partitioning and re-mapping techniques that distributed-memory multiprocessor codes employ to minimize interprocessor communication: re-mapping each set of local data reduces cache misses in the same way that re-mapping the global data reduces off-processor references. We demonstrate the applicability and performance of RTRC on several prevalent applications: Sparse Matrix-Vector Multiply, Particle-In-Cell, and CHARMM-like codes. Performance results on SPARC-20, SP-2, and T3-D processors show that single-node execution performance can be improved by as much as 35%.
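The abstract does not spell out the RTRC algorithm itself. As a rough illustration of the underlying idea (renumbering data so that elements referenced together through an indirection array sit together in memory), the Python sketch below applies a simple first-touch renumbering to the column-index array of a CSR sparse matrix-vector multiply. First-touch renumbering is a standard data-reordering heuristic substituted here for RTRC's partition-based re-mapping; it is not the authors' method. The function names, the strided access pattern, and the toy fully associative LRU cache sizes are all hypothetical.

```python
from collections import OrderedDict
from typing import List, Tuple


def first_touch_remap(col: List[int], n: int) -> Tuple[List[int], List[int]]:
    """Renumber the n data elements in the order they are first referenced.

    Hypothetical helper (first-touch heuristic, not RTRC itself).
    Returns (perm, new_col): perm[new_id] = old_id, and new_col[k] is col[k]
    in the new numbering. Unreferenced elements go last, in original order.
    """
    new_of_old = [-1] * n
    perm: List[int] = []
    for old in col:
        if new_of_old[old] == -1:
            new_of_old[old] = len(perm)
            perm.append(old)
    for old in range(n):
        if new_of_old[old] == -1:
            new_of_old[old] = len(perm)
            perm.append(old)
    new_col = [new_of_old[old] for old in col]
    return perm, new_col


def spmv_csr(row_ptr, col, val, x):
    """y = A x for A stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(row_ptr) - 1)
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += val[k] * x[col[k]]  # indirect reference through col
        y[r] = acc
    return y


def lru_misses(refs, line_elems=8, cache_lines=4):
    """Count misses of a toy fully associative LRU cache (assumed sizes)."""
    cache = OrderedDict()
    misses = 0
    for i in refs:
        line = i // line_elems
        if line in cache:
            cache.move_to_end(line)  # mark line as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used line
    return misses


if __name__ == "__main__":
    n = 384
    # Hypothetical strided pattern: each row touches 6 elements 64 apart.
    col = [k * 64 + j for j in range(8) for k in range(6)]
    row_ptr = list(range(0, len(col) + 1, 6))
    val = [1.0] * len(col)
    x = [float(i) for i in range(n)]

    perm, new_col = first_touch_remap(col, n)
    x_new = [x[old] for old in perm]  # permute the data array once

    # Same values summed in the same order, so results match exactly.
    assert spmv_csr(row_ptr, col, val, x) == spmv_csr(row_ptr, new_col, val, x_new)
    print("misses before:", lru_misses(col))      # 48
    print("misses after: ", lru_misses(new_col))  # 6
```

On this contrived pattern the toy cache thrashes on the original numbering and misses only once per line after renumbering; real gains depend on the access pattern and the hardware. The renumbering costs one pass over the indirection array plus one permutation of the data, so, under the assumption that the indirection array is reused across many iterations, that cost is amortized.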

Similar articles

Optimizing DDA Code on a POWER5 Processor

In this paper we take an existing scientific computation code, DDA, and optimize it to run on an IBM Power5 processor. The DDA code, originally developed by a Ph.D. candidate in physics, suffers from excessive execution time caused by a high number of cache accesses and a low rate of instructions per cycle. Our goal is to improve the code’s performance by making a series of optimizations in a s...

Energy Optimization Using Object Co-Location in Java

With the paradigm shift in computer systems towards ubiquitous computing, energy, together with performance, has become an important parameter to measure efficiency. Java is increasingly becoming the programming language of choice for applications expected to run in embedded and mobile environments. Java's platform independence and security features serve the needs of these environments very well...

Cop - Cache Optimization Tools for Scientific Computing

The technological improvements in silicon manufacturing are yielding vast increases of processor's speed and memory chip capacity. At the same time, the main memory access time is experiencing modest gains, creating a dramatic disparity between the processor's clock and the access time of main memory. These developments make the effective use of the cache memory paramount to overall program effici...

A memory-layout oriented run-time technique for locality optimization

Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout oriented approach to exploit cache locality for parallel loops at run-time on Symmetric Multi-Processor (SMP) systems. Guided by application-dependent hints and the targeted cache architecture, it reorganizes and partit...

Memory Data Organization for Improved Cache Performance in Embedded Processor Applications (Preeti Ranjan Panda and Nikil...)

Code generation for embedded processors opens up the possibility for several performance optimization techniques that have been ignored by traditional compilers due to compilation time constraints. We present techniques that take into account the parameters of the data caches for organizing scalar and array variables declared in embedded code into memory, with the objective of improving data ca...

Publication date: 2007